Combining Transformer Embeddings with Linguistic Features for Complex Word Identification

Authors

Abstract

Identifying which words in a text may be difficult for common readers to understand is a well-known subtask of text complexity analysis. The advent of deep language models has also established a new state of the art in this task by means of end-to-end semi-supervised (pre-trained) and downstream training of, mainly, transformer-based neural networks. Nevertheless, the usefulness of traditional linguistic features in combination with neural encodings is worth exploring, as the computational cost of running such networks is becoming more relevant under energy-saving constraints. This study explores lexical complexity prediction (LCP) by combining pre-trained and adjusted transformer networks with different types of traditional linguistic features. We apply these features over classical machine learning classifiers. Our best results are obtained by applying Support Vector Machines on an English corpus, with LCP solved as a regression problem. The results show that linguistic features can be useful in LCP tasks and can improve the performance of deep learning systems.
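
The combination described above (an encoding of the target word joined with traditional linguistic features and passed to a support vector regressor) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. `fake_embedding` is a hypothetical stand-in for a real transformer embedding. The three features (scaled length, a vowel-group syllable estimate, and log frequency from the made-up `TOY_FREQ` table) are assumed rather than taken from the paper. The regressor is a plain linear SVR with the epsilon-insensitive loss, trained by subgradient descent, which may differ from the kernel and solver the authors used. The complexity labels are invented for illustration.

```python
import math
import random
import re

# Hypothetical toy frequency table (occurrences per million words). A real
# system would use frequencies from a large reference corpus.
TOY_FREQ = {"cat": 120.0, "house": 300.0, "ubiquitous": 2.5, "photosynthesis": 0.8}


def linguistic_features(word):
    """Assumed hand-crafted features: scaled length, syllable estimate, log frequency."""
    w = word.lower()
    syllables = max(1, len(re.findall(r"[aeiouy]+", w)))  # crude vowel-group count
    log_freq = math.log1p(TOY_FREQ.get(w, 0.0))  # unseen words get frequency 0
    return [len(w) / 10.0, syllables / 5.0, log_freq / 10.0]


def fake_embedding(word, dim=8):
    """Hypothetical stand-in for a transformer embedding: a reproducible pseudo-random vector."""
    rng = random.Random(word)  # seeded by the word itself, so the vector is deterministic
    return [rng.uniform(-0.1, 0.1) for _ in range(dim)]


def combined_vector(word):
    """Concatenate the (stub) embedding with the linguistic features."""
    return fake_embedding(word) + linguistic_features(word)


def train_linear_svr(X, y, epsilon=0.05, lam=1e-3, lr0=0.2, epochs=2000):
    """Linear SVR trained by full-batch subgradient descent on
    lam/2 * ||w||^2 + mean(max(0, |w.x + b - y| - epsilon))."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for epoch in range(epochs):
        lr = lr0 / math.sqrt(1 + epoch)  # decaying step size
        gw, gb = [lam * wj for wj in w], 0.0  # gradient of the L2 penalty
        for xi, yi in zip(X, y):
            r = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            if abs(r) > epsilon:  # outside the tube: subgradient of |r| is sign(r)
                s = 1.0 if r > 0 else -1.0
                for j in range(d):
                    gw[j] += s * xi[j] / n
                gb += s / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b


def predict(w, b, x):
    """Linear prediction w.x + b (a complexity score)."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


words = ["cat", "house", "ubiquitous", "photosynthesis"]
labels = [0.05, 0.10, 0.60, 0.75]  # made-up complexity scores for illustration
X = [combined_vector(wd) for wd in words]
w, b = train_linear_svr(X, labels)
for wd, x in zip(words, X):
    print(f"{wd}: {predict(w, b, x):.3f}")
```

Concatenation keeps the two feature sources separable, so the same regressor can be trained on embeddings alone, linguistic features alone, or both. In practice a library solver such as scikit-learn's `sklearn.svm.SVR` would normally replace the hand-written trainer.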

Similar articles

AI-KU at SemEval-2016 Task 11: Word Embeddings and Substring Features for Complex Word Identification

We investigate the usage of word embeddings, namely GloVe and SCODE, along with substring features on the Complex Word Identification task. We introduce two systems: the first system utilizes the word embeddings of the target word and its substrings as features while the other considers the context information by using the embeddings of the surrounding words as well. Although the proposed represent...

CLaC at SemEval-2016 Task 11: Exploring linguistic and psycho-linguistic Features for Complex Word Identification

This paper describes the system deployed by the CLaC-EDLK team to the SemEval 2016 Complex Word Identification task. The goal of the task is to identify if a given word in a given context is simple or complex. Our system relies on linguistic features and cognitive complexity. We used several supervised models; however, the Random Forest model outperformed the others. Overall our best configurat...

AutoExtend: Combining Word Embeddings with Semantic Resources

We present AutoExtend, a system that combines word embeddings with semantic resources by learning embeddings for non-word objects like synsets and entities and learning word embeddings which incorporate the semantic information from the resource. The method is based on encoding and decoding the word embeddings and is flexible in that it can take any word embeddings as input and does not need an...

Combining Machine Learning with Linguistic Heuristics for Chinese Word Segmentation

This paper describes a hybrid model that combines machine learning with linguistic heuristics for integrating unknown word identification with Chinese word segmentation. The model consists of two components: a position-of-character (POC) tagging component that annotates each character in a sentence with a POC tag that indicates its position in a word, and a merging component that transforms a P...

Combining Contextual Features for Word Sense Disambiguation

In this paper we present a maximum entropy Word Sense Disambiguation system we developed which performs competitively on SENSEVAL-2 test data for English verbs. We demonstrate that using richer linguistic contextual features significantly improves tagging accuracy, and compare the system’s performance with human annotator performance in light of both fine-grained and coarse-grained sense distin...

Journal

Journal title: Electronics

Year: 2022

ISSN: 2079-9292

DOI: https://doi.org/10.3390/electronics12010120